Project Outline

This report is Part 3 of a five-part series exploring and analyzing ocean buoy data collected from NOAA-maintained National Data Buoy Center (NDBC) stations. In Part 1 we explored ocean current observations at NDBC Station 46087 (Neah Bay Buoy) and compared them with ocean current forecasts from a third party. In Part 2 we looked at meteorological (wind and wave) data from the Neah Bay Buoy and examined the potential for significant meteorological events to introduce noise in ocean current observations. Here in Part 3 we introduce meteorological data for another location, NDBC Station 46088 (New Dungeness Buoy), and compare trends in wave height, period, and direction with those of the Neah Bay Buoy. We will attempt to highlight the relationship between swell events at the Neah Bay Buoy and swell events at the New Dungeness Buoy. In Part 4 we will walk through the considerations and processes involved in training and testing a supervised ML model to predict the class of wave that might occur at the New Dungeness Buoy given conditions at the Neah Bay Buoy. In Part 5 we will put our final classifier model into production by supplying forecasted conditions for the Neah Bay Station and determining the predicted class of wave observed at the New Dungeness Station.

More detailed information regarding the NDBC, and the locations of buoys they maintain, can be found on their website.

Executive Summary: Part 3

In this report we examine trends in relationships between meteorological observations at the Neah Bay and New Dungeness NDBC Stations. We compare summary statistics for monthly and yearly aggregated observations, noting an overall smaller wave size, as well as an increase in summer-time wave heights at the New Dungeness Buoy when compared with the Neah Bay Buoy.

Next we focus on conditions at the New Dungeness Buoy, exploring the distribution of wave height observations faceted by swell type. We notice a seasonality to the groundswell and windswell activity, which, as we recall, is consistent with trends at the Neah Bay Buoy.

Then we look at the relationships between wave height and wave direction at the New Dungeness Buoy, faceted by swell type. We notice a common vein of wave direction from the SW and WSW directions. In addition, we notice windwave and chop swell types also show clustering from the ESE direction. We compare these wave characteristics with wind characteristics faceted along the same swell types, and notice the potential for strong correlation between local wind events and windwave and chop swell types.

Finally, we explore time series plots of wave heights at both NDBC Stations and see evidence of a relationship between strong swell events at the Neah Bay Buoy and observations of groundswell at the New Dungeness Buoy. We drill down into the data to examine two specific swell events and further explore the potential for local wind events at the New Dungeness Buoy to ‘mask’ underlying groundswell conditions.

Data

The data used in this report was acquired through the NDBC website. Nicely formatted, yearly ‘.txt’ files are available for download for years 2004 to 2019, but some wrangling is necessary. Issues regarding data quality include: the addition of the minute of observation column in 2005, a re-assignment of variable names beginning in 2007, several shifts in the frequency of recorded observations, as well as a considerable number of missing observations. After dealing with these data quality issues, I chose to engineer several new features including: id, dir, w_dir, and swell_type. Further definitions and descriptions for each field in the dataset can be found in the appendix of this report and on the NDBC’s measurement definitions webpage.
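To make the wrangling concrete, here is a minimal sketch of how the yearly files might be harmonized. It is not the code used for this report. The rename map (`WD` to `WDIR`, `BAR` to `PRES`), the `#`-prefixed units row, and the all-nines placeholder strings for missing values are assumptions about the NDBC file layout, and `parse_ndbc` is a hypothetical helper name.

```python
# Assumed renames from the pre-2007 headers to the current NDBC names.
LEGACY_NAMES = {"YYYY": "YY", "#YY": "YY", "WD": "WDIR", "BAR": "PRES"}
# Assumed placeholder strings written in place of missing measurements.
MISSING = {"99.0", "99.00", "999", "999.0", "9999.0"}
TIME_FIELDS = ("YY", "MM", "DD", "hh", "mm")


def parse_ndbc(text):
    """Parse one yearly file into a list of dicts with harmonized field names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [LEGACY_NAMES.get(name, name) for name in lines[0].split()]
    rows = []
    for line in lines[1:]:
        if line.startswith("#"):  # units row (assumed present in newer files)
            continue
        raw = dict(zip(header, line.split()))
        raw.setdefault("mm", "0")  # files without a minute column
        row = {}
        for key, value in raw.items():
            if key in TIME_FIELDS:
                row[key] = int(value)
            elif value in MISSING:
                row[key] = None  # keep the gap explicit rather than a fake number
            else:
                row[key] = float(value)
        rows.append(row)
    return rows
```

Keeping missing values as `None` rather than dropping the row lets later steps decide, field by field, whether an observation is still usable.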

Summary Statistics

Here let’s explore aggregated information for both stations.

First, the New Dungeness Buoy:

Summary Statistics for all Months, NDBC Station 46088
Month Number of Observations Mean Wave Dir Mean Wave Height Mean APD Mean DPD Mean Wind Dir Mean WSPD Mean PRES Mean ATMP Mean WTMP
1 19711 85.77 0.38 3.18 2.60 140.26 5.09 1017.12 6.39 7.88
2 19116 86.99 0.35 3.01 2.43 160.78 4.98 1016.36 6.45 7.72
3 21081 102.17 0.36 3.08 2.64 186.61 4.85 1015.39 7.48 8.01
4 18909 112.06 0.35 3.03 2.48 216.28 5.04 1016.39 8.73 8.84
5 21077 127.97 0.33 2.94 2.32 229.63 5.00 1016.60 10.34 9.79
6 20613 140.32 0.34 2.96 2.44 233.82 5.23 1016.52 11.58 10.52
7 22392 155.48 0.36 2.88 2.80 236.13 5.43 1016.84 12.56 11.32
8 22670 131.87 0.30 2.84 2.35 231.26 4.77 1015.67 12.90 11.66
9 20872 83.31 0.21 3.13 1.83 217.41 3.38 1016.23 12.18 11.12
10 20651 78.30 0.25 3.13 2.27 183.48 3.63 1016.29 10.39 10.16
11 17203 94.33 0.36 3.06 2.91 158.14 4.89 1016.69 8.31 9.34
12 22631 84.70 0.38 2.83 2.67 146.16 5.46 1016.72 6.49 8.41
Summary Statistics for All Years, NDBC Station 46088
Year Number of Observations Mean Wave Dir Mean Wave Height Mean APD Mean DPD Mean Wind Dir Mean WSPD Mean PRES Mean ATMP Mean WTMP
2004 4224 192.89 0.38 3.30 4.89 195.36 4.39 1016.02 11.02 10.61
2005 15913 95.81 0.20 1.61 2.34 191.05 4.57 1015.47 9.99 10.12
2006 13612 100.93 0.23 1.71 2.39 208.24 5.14 1016.24 10.24 10.23
2007 17291 92.56 0.21 1.63 2.31 201.76 4.87 1017.19 9.07 8.78
2008 17361 78.63 0.24 2.03 1.90 199.60 4.95 1017.42 8.59 8.57
2009 13726 70.77 0.22 2.00 1.49 204.60 4.68 1017.03 9.68 9.41
2010 17215 128.43 0.44 3.81 3.08 192.22 5.13 1014.14 9.76 9.55
2011 17469 128.93 0.40 3.79 2.84 197.91 4.90 1016.53 8.78 9.05
2012 16521 128.10 0.42 3.74 2.92 198.15 5.03 1015.55 9.09 9.13
2013 15311 104.65 0.36 3.75 2.23 191.67 4.52 1018.34 8.98 8.99
2014 15143 118.24 0.40 3.68 2.66 186.20 4.81 1016.37 9.76 9.69
2015 15633 115.00 0.35 3.78 2.42 206.73 4.44 1017.00 10.53 10.26
2016 17470 132.16 0.43 3.78 3.08 197.24 5.03 1015.39 10.39 10.24
2017 13833 134.14 0.41 3.71 2.77 198.10 5.07 1015.84 9.46 9.58
2018 16208 109.88 0.37 3.32 2.53 193.19 4.92 1017.06 9.62 9.59
2019 19996 58.02 0.21 2.32 1.47 180.16 4.37 1016.59 9.35 9.53

And recall these statistics for the Neah Bay Buoy:

Summary Statistics for all Months, NDBC Station 46087
Month Number of Observations Mean Wave Dir Mean Wave Height Mean APD Mean DPD Mean Wind Dir Mean WSPD Mean PRES Mean ATMP Mean WTMP
1 17879 248.71 2.41 7.74 12.08 143.46 7.70 1017.37 6.83 8.34
2 16743 254.41 2.16 7.66 11.87 145.15 6.63 987.68 6.63 8.17
3 17354 258.87 2.02 7.54 11.41 164.79 5.92 1015.75 7.47 8.75
4 16846 265.48 1.97 7.55 11.25 187.24 5.23 1017.27 8.72 9.66
5 17933 267.55 1.53 6.89 9.81 208.42 3.85 1017.15 10.54 10.68
6 16917 269.30 1.39 6.65 9.12 215.32 3.23 1017.47 11.73 11.39
7 19932 271.54 1.27 6.46 8.82 218.95 2.91 1017.96 12.53 11.86
8 20044 275.57 1.25 6.49 8.53 210.02 2.79 1016.60 12.68 11.95
9 18949 271.16 1.50 7.11 9.72 172.20 3.43 1016.60 12.30 11.73
10 21140 262.29 1.94 7.63 11.00 146.57 5.33 1016.13 10.91 11.31
11 19899 256.23 2.31 7.47 11.15 151.80 7.05 1015.19 8.99 10.61
12 19187 258.39 2.51 7.72 12.02 144.63 7.25 1016.26 6.94 9.09
Summary Statistics for All Years, NDBC Station 46087
Year Number of Observations Mean Wave Dir Mean Wave Height Mean APD Mean DPD Mean Wind Dir Mean WSPD Mean PRES Mean ATMP Mean WTMP
2004 4218 266.35 1.89 7.22 10.42 171.05 4.44 1016.30 11.13 11.37
2005 16072 270.09 1.89 7.19 10.75 173.41 4.71 1015.66 10.07 10.72
2006 13157 267.43 1.93 7.04 10.24 190.36 4.75 980.79 10.20 10.72
2007 17072 262.73 2.03 7.38 10.66 177.64 5.21 1016.88 9.29 9.92
2008 17029 261.82 2.14 7.60 11.17 184.19 5.04 1016.93 8.51 9.31
2009 15434 268.64 1.74 7.09 10.26 191.26 4.95 1016.83 9.36 9.84
2011 12363 270.41 1.74 7.16 10.35 188.45 4.23 1017.68 10.65 10.29
2012 12853 264.33 1.89 7.20 10.38 177.13 5.29 1015.15 9.04 9.68
2013 10249 263.40 1.89 7.41 11.29 164.05 5.67 1020.69 7.57 8.68
2014 17405 261.32 1.80 7.09 10.29 176.30 5.59 1016.42 10.07 10.70
2015 17463 264.50 1.85 7.28 10.44 172.09 5.00 1016.68 10.68 11.20
2016 17451 260.11 2.02 7.46 10.74 169.52 5.36 1015.63 10.64 11.23
2017 17332 258.14 1.76 7.04 10.10 168.77 5.43 1015.88 9.60 10.41
2018 17420 264.24 1.79 7.18 10.32 175.46 5.16 1017.07 9.97 10.58
2019 17305 261.97 1.69 7.25 10.99 159.91 4.93 1016.22 10.04 10.64
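For readers who want to reproduce tables like those above, the grouping step can be sketched as follows. This is an illustration only: the row layout (a list of dicts with an integer `MM` month field) and the helper name `monthly_mean` are assumptions, and the report does not state whether missing values were counted in its "Number of Observations" column, so here they are simply skipped.

```python
from collections import defaultdict
from statistics import mean


def monthly_mean(rows, field):
    """Return {month: (n_observations, mean of field)}, skipping missing values."""
    by_month = defaultdict(list)
    for row in rows:
        if row[field] is not None:
            by_month[row["MM"]].append(row[field])
    return {m: (len(v), round(mean(v), 2)) for m, v in sorted(by_month.items())}


# Tiny made-up sample, only to show the call shape.
sample = [
    {"MM": 1, "WVHT": 0.4},
    {"MM": 1, "WVHT": 0.2},
    {"MM": 1, "WVHT": None},
    {"MM": 7, "WVHT": 0.5},
]
print(monthly_mean(sample, "WVHT"))
```

The same function works for the yearly tables by grouping on the year field instead of the month.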

Let’s use data visualization techniques to help us better understand monthly trends.

As we can see, the aggregated monthly averages differ considerably between the two stations. There are similarities in seasonal trends, with the exception of a notable spike in average wave heights during the summer months at the New Dungeness Buoy. As we will explore in more detail later in this report, wave heights and directions are strongly correlated with ‘local’ wind conditions at the New Dungeness Buoy.

Let’s quickly examine yearly aggregations for both stations:

Explore Distributions and Feature Relationships

Moving forward in our analysis we will be paying attention to wave size and period. The dominant periods have been classified into five distinct groupings: groundswell (13 seconds and greater), windswell (at least 10 but less than 13 seconds), windwave (greater than 4 but less than 10 seconds), chop (4 seconds or less), and flat (zero seconds with zero wave height).
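The classification can be written as a small function. This is a sketch that follows the thresholds given in the swell_type definition in the appendix. The function name `classify_swell` is hypothetical, and the original feature was not necessarily built this way.

```python
def classify_swell(dpd, wvht):
    """Map dominant wave period (s) and wave height (m) to a swell type label."""
    if dpd == 0 and wvht == 0:
        return "flat"
    if dpd >= 13:
        return "groundswell"
    if dpd >= 10:
        return "windswell"
    if dpd > 4:
        return "windwave"
    return "chop"


for period in (14.8, 11.4, 6.2, 3.1):
    print(period, classify_swell(period, 0.3))
```

Checking the flat case first matters, because a period of 0 would otherwise fall through to chop.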

Let’s start with a look at the distribution of wave height for the New Dungeness Buoy:

We can see the majority of observations are clustered at 0, and near 0.25 meters in height.

Let’s see how swell types play into this distribution:

Chop appears to be the largest class, followed by flat, windwave, windswell, then groundswell.

Let’s take a look at monthly distributions to see if any seasonal trends are apparent:

Now let’s filter out flat, chop, and windwave to examine the monthly distributions of the under-represented classes:

It’s easy to see that summer months have far fewer groundswell and windswell observations, while winter months have more.

Let’s move on by examining the relationship between Wave Height and Mean Wave Direction at the New Dungeness Buoy:

Again, it’s easy to see the trends in wave height and wave direction with chop and windwaves, but what about the classes with fewer observations?

Let’s facet on swell type to find out:

All four swell types have a clustering around the SW/WSW directions, while windwave and chop have additional clusters around the ESE direction.

Now let’s have a look at the wind data for the New Dungeness Buoy to see if there are any apparent relationships with the wave data.

This can be a tricky transition to comprehend, but we are looking at wind speed versus wind direction, faceted on the swell type recorded with each wind observation. Compared with the previous plot, we see parallel structure in the windwave and chop facets, which implies correlation between local wind conditions and the windwave and chop classes. The groundswell and windswell facets, by contrast, are more spread out across the spectrum of wind directions and bear less resemblance to the corresponding facets of the previous plot. This implies less correlation between local wind conditions and the groundswell and windswell classes.

In addition, notice that very few windswell, and even fewer groundswell, observations occur with local wind conditions greater than 10 m/s.

Compare Time Series Plots

Consider this series of yearly plots showing the wave conditions for both stations:

If we look carefully we notice a trend where a cluster of groundswell observations at the New Dungeness Buoy seems to correspond to an increase in wave magnitude (height and period) at the Neah Bay Buoy. In particular, compare the observations at both stations during November 2016, and also during April 2019.

First, let’s look at November 2016:

I see a cluster of groundswell readings at the New Dungeness Buoy around November 23rd to 24th, 2016. Let’s have a closer look:

Here color represents wave direction, with the shape of the point representing swell type. Recall that chop and windwaves at the New Dungeness Buoy are likely strongly correlated with local wind conditions. Let’s explore wind conditions for both stations on November 23rd & 24th, 2016:

There is a very obvious spike in local wind speeds at the New Dungeness Buoy during the morning of November 24th. This increase in wind strength corresponds to an increase in windwave size with a wave direction aligned with the wind direction. We can see this ‘local wind event’ results in the windwave class potentially masking any underlying long period swell at the New Dungeness Buoy.
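One way to make "wave direction aligned with the wind direction" checkable is to compare the two bearings on the compass circle. The sketch below is an illustration and was not used to produce the plots above. The 45 degree tolerance and the helper names `angular_difference` and `is_wind_aligned` are assumptions chosen for the example.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    diff = abs(a - b) % 360
    return 360 - diff if diff > 180 else diff


def is_wind_aligned(mwd, wdir, tolerance=45):
    """True when wave direction (MWD) is within `tolerance` degrees of wind direction (WDIR)."""
    return angular_difference(mwd, wdir) <= tolerance


print(angular_difference(350, 10))  # bearings on either side of north
print(is_wind_aligned(110, 100))    # waves and wind both roughly from the ESE
print(is_wind_aligned(250, 110))    # WSW waves against ESE wind
```

Working with the wrapped difference avoids treating 350 and 10 degrees as far apart, which a plain subtraction would do.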

Let’s explore one more instance of groundswell observations at the New Dungeness Buoy. Consider the wave observations, and concurrent wind conditions around April 5th to 10th, 2019, for both NDBC Stations:

Again we see correlation between ‘local wind events’ at the New Dungeness Buoy and spikes in observations of windwave and chop swell types with wave directions aligned with wind directions. Between these two wind events we see lighter wind observations and wave conditions at the New Dungeness Buoy more aligned with wave conditions at the Neah Bay Buoy.

Summary, Considerations, and Next Steps

We have explored meteorological conditions for both NDBC Stations and found a relationship between strong wave events at the Neah Bay Buoy and observations of groundswell at the New Dungeness Buoy. In addition, we have shown the likelihood of strong correlation between local wind events and observations of windwaves and chop at the New Dungeness Buoy. We have explored instances where these local wind events likely interfered with, or masked, recordings of underlying groundswell conditions.

Moving forward, our goal is to develop a supervised machine learning model to predict the swell type at the New Dungeness Buoy, given conditions at the Neah Bay Buoy.

In preparation for this task, it will be necessary to translate swell type labels from the New Dungeness Buoy to observations at the Neah Bay Buoy. Consider that an observation at the New Dungeness Buoy doesn’t translate to an observation at the exact same time at the Neah Bay Buoy. One important step will be quantifying the time-shift in this translation. Wave speed is a function of wave period, not to mention the treadmill-like effect of ocean current. Additionally, wave direction will be another important factor to consider in quantifying this translation of observations from the New Dungeness Buoy to the Neah Bay Buoy.
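As a first-order illustration of why period matters for the time-shift, the deep-water approximation gives a group speed of g·T/(4π), so longer-period swell travels faster. The sketch below rests on several assumptions: deep water throughout, no ocean current, no refraction, and a placeholder path length of 120 km that is not a measured distance between the two buoys. The function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def group_speed(period_s):
    """Deep-water group speed (m/s) for a wave of the given period."""
    return G * period_s / (4 * math.pi)


def travel_hours(distance_km, period_s):
    """Hours for wave energy of the given period to cover distance_km."""
    return distance_km * 1000 / group_speed(period_s) / 3600


PLACEHOLDER_DISTANCE_KM = 120  # illustrative only, not a surveyed distance

for period in (10, 13, 16):
    hours = travel_hours(PLACEHOLDER_DISTANCE_KM, period)
    print(f"{period:>2} s period: {group_speed(period):.1f} m/s, {hours:.1f} h")
```

In practice the shift would more likely be estimated from the data, for example by lagging one station's series against the other, with a calculation like this serving only as a sanity check on the order of magnitude.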

Furthermore, in order to supply our model with the most accurate data, it may be necessary to subset our data to include only observations whose labels were recorded under less windy conditions at the New Dungeness Buoy. Setting a threshold around 10 m/s for winds at the New Dungeness Buoy may offer the best balance between data quantity and accuracy of observations.
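A sketch of such a subset, reporting how much data survives the cut, might look like the following. The row layout (dicts with a `WSPD` field in m/s) and the helper name `calm_subset` are assumptions, and the 10 m/s default simply mirrors the threshold suggested above.

```python
def calm_subset(rows, max_wspd=10.0):
    """Keep rows with a known wind speed below max_wspd; also return the kept fraction."""
    kept = [r for r in rows if r["WSPD"] is not None and r["WSPD"] < max_wspd]
    fraction = len(kept) / len(rows) if rows else 0.0
    return kept, fraction


# Made-up wind speeds, only to show the call shape.
observations = [{"WSPD": 3.0}, {"WSPD": 12.0}, {"WSPD": None}, {"WSPD": 9.9}]
kept, fraction = calm_subset(observations)
print(len(kept), fraction)
```

Returning the retained fraction alongside the rows makes it easy to compare candidate thresholds before settling on one.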

Appendix

Data Definitions

Here we will walk through a definition and short description for each field in the dataset:

  • id indicates the location. NDBC Station ID 46087 refers to the Neah Bay Buoy and 46088 refers to the New Dungeness Buoy.
  • Date_Time is the year, month, day, and time of the recorded observation. Observations are recorded twice hourly then stored in GMT/UTC timezone by the NDBC.
  • WVHT is defined by the NDBC website as, “Significant wave height (meters) is calculated as the average of the highest one-third of all of the wave heights during the 20-minute sampling period.”
  • MWD is defined by the NDBC website as, “The direction from which the waves at the dominant period (DPD) are coming. The units are degrees from true North, increasing clockwise, with North as 0 (zero) degrees and East as 90 degrees.”
  • dir is a feature I engineered using the data from MWD. Values follow the standard notation for cardinal direction. More information on cardinal direction can be found here.
  • swell_type is a feature I engineered using DPD. Values indicate whether a given observation is classified as ‘groundswell’, having a dominant wave period of greater than or equal to 13 seconds, or ‘windswell’, having a dominant wave period of less than 13 seconds but greater than or equal to 10 seconds, or ‘windwave’, having a dominant period less than 10 seconds but greater than 4 seconds, or ‘chop’, having a dominant period 4 seconds or smaller, or ‘flat’, having dominant period equal to 0 with a wave height of 0.
  • DPD is defined by the NDBC website as, “Dominant wave period (seconds) is the period with the maximum wave energy.”
  • APD is defined by the NDBC website as, “Average wave period (seconds) of all waves during the 20-minute period.”
  • WDIR is defined by the NDBC website as, “Wind direction (the direction the wind is coming from in degrees clockwise from true N) during the same period used for WSPD.”
  • w_dir is a feature I engineered using the data from WDIR and the same value definitions as dir.
  • WSPD is defined by the NDBC website as, “Wind speed (m/s) averaged over an eight-minute period for buoys.”
  • GST is defined by the NDBC website as, “Peak 5 or 8 second gust speed (m/s) measured during the eight-minute or two-minute period. The 5 or 8 second period can be determined by payload.”
  • PRES is defined by the NDBC website as, “Sea level pressure (hPa).”
  • ATMP is defined by the NDBC website as, “Air temperature (Celsius).”
  • WTMP is defined by the NDBC website as, “Sea surface temperature (Celsius). For buoys the depth is referenced to the hull’s waterline.”
  • DEWP is defined by the NDBC website as, “Dewpoint temperature taken at the same height as the air temperature measurement.”
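As an illustration of how dir and w_dir could be derived from MWD and WDIR, here is a minimal sketch that maps degrees to compass labels. The 16-point compass is an assumption, inferred from labels such as WSW and ESE used in this report, and `to_cardinal` is a hypothetical helper name.

```python
POINTS = [
    "N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
    "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW",
]


def to_cardinal(degrees):
    """Map a bearing in degrees from true North to a 16-point compass label."""
    # Each sector spans 22.5 degrees and is centered on its compass point.
    index = int((degrees % 360) / 22.5 + 0.5) % 16
    return POINTS[index]


for bearing in (0, 112.5, 225, 247.5, 359):
    print(bearing, to_cardinal(bearing))
```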

Further details regarding measurement techniques utilized by the NDBC can be found here.